Quantifying the Effects of Correlated Covariates on Variable Importance Estimates from Random Forests
Authors
Abstract
Quantifying the Effects of Correlated Covariates on Variable Importance Estimates from Random Forests. By Ryan Vincent Kinies. A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science at Virginia Commonwealth University. Virginia Commonwealth University, 2006. Major Director: Kellie J. Archer, Ph.D., Assistant Professor, Department of Biostatistics.
Recent advances in computing technology have led to the development of algorithmic modeling techniques. These methods can be used to analyze data that are difficult to analyze using traditional statistical models. This study examined the effectiveness of variable importance estimates from the random forest algorithm in identifying the true predictor among a large number of candidate predictors. A simulation study was conducted using twenty different levels of association among the independent variables and seven different levels of association between the true predictor and the
Similar resources
Comparison of Survival Forests in Analyzing First Birth Interval
Background and objectives: The application of statistical machine learning methods, such as ensemble-based approaches, to the survival analysis of time-to-event data sets has received considerable interest over the past decades. One of these practical methods is survival forests, which have been developed in a variety of contexts due to their high precision and their non-parametric, non-linear nature. This...
The Performance of small samples in quantifying structure central Zagros forests utilizing the indexes based on the nearest neighbors
Abstract Today's forest structure issue has converted to one of the main ecological debates in forest science. Determination of forest structure characteristics is necessary to investigate stands changing process, for silviculture interventions and revival operations planning. In order to investigate structure of the part of Ghale-Gol forests in Khorramabad, a set of indices such as Cla...
Beta-Binomial and Ordinal Joint Model with Random Effects for Analyzing Mixed Longitudinal Responses
The analysis of discrete mixed responses is an important statistical issue in various sciences. Ordinal and overdispersed binomial variables are discrete. Overdispersed binomial data are a sum of correlated Bernoulli experiments with equal success probabilities. In this paper, a joint model with random effects is proposed for analyzing mixed overdispersed binomial and ordinal longitudinal respo...
ggRandomForests: Survival with Random Forests
Random Forests (Breiman 2001) (RF) are a fully non-parametric statistical method requiring no distributional assumptions on covariate relation to the response. RF are a robust, nonlinear technique that optimizes predictive accuracy by fitting an ensemble of trees to stabilize model estimates. Random Forests for survival (Ishwaran and Kogalur 2007; Ishwaran, Kogalur, Blackstone, and Lauer 2008) ...
Zeileis Danger: High Power! – Exploring the Statistical Properties of a Test for Random Forest Variable
Random forests have become a widely-used predictive model in many scientific disciplines within the past few years. Additionally, they are increasingly popular for assessing variable importance, e.g., in genetics and bioinformatics. We highlight both advantages and limitations of different variable importance scores and associated testing procedures, especially in the context of correlated pred...
Publication date: 2006